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Abstract 

Since the consistency of maximum likelihood estimator has been proved, the 
only problem which is left is the problem of optimization. In last century, it 
was found that some splines were very useful. From Stone- Weirstr ass theo- 
rem, we can approximate continuous functions by the polynomials and hence 
we can construct the estimator by using the splines. Therefore, it might not 
be so important to stress the difference between the parametric approach and 
nonparametric approach. Usually, a nonlinear optimization problem is not so 
easy to solve and it is assumed that the optimization problem can be solved 
by existent packages. From the view point of mathematics, the results of the 
optimization problem should be verified or reinvestigated because the nonlinear 
optimization problem is not simple. It seems that the nonlinear optimization 
play an important role in density estimation. Though nonlinear equations must 
be solved in most optimization problems, we will show how a optimization prob- 
lem can be solved by finding the solution of systems of linear equations. Basing 
on this approach, the optimization problem can be solved by solving a quadratic 
equation finally. Some numerical examples are studied as well. From the figures, 
it can be found this is a good approach on density function estimation. 

Introduction 

The problem of density estimation is to estimate the density function / by 
a set of observations, xi, X2,. . . , Xm- Roughly, the function with parameters, 
denoted by the notation/ (a;, 6*), is called the estimator of /. We assume that 
there is a family of functions, say 

3 = {/(.T,0);0ei?"}, (1) 
and / G The likelihood function / is defined 

m 

From the work of the statistician, the information of / can be obtained by 
maximizing the likelihood function of the density estimator [1]. Usually, it is not 
a simple work to solve the nonlinear equations. So far, we know how to solve 
a single linear equation, a system of linear equations and a single quadratic 
equation. In this paper, the optimization problem is transformed to system 
of linear equations first. Basing on the approach, the optimization problem is 
transformed to a single quadratic equation. Finally, we can solve the optimiza- 
tion problem effectively. The work of transformation is not so simple though 
the idea is simple. Besides, the undesired roughness of nonparametric estimator 
is a serious problem. Since our approach is expected to estimate the density 
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function of general cases, this serious problem must be studied in the same time. 
In the middle of 20*'' century, several splines were studied. There are many ap- 
plications of these splines such as computer aid design of cars [2] , curve fitting 
in statistics, computation of energy levels of multi-electron atoms etc [3] . These 
splines were introduced to diminish the oscillations of the curve which is ob- 
tained by the method of traditional polynomial curve fitting. Therefore, these 
splines can remove the roughness of density estimators. It is possible to solve 
these problems in the same time. Anyone who knows the elementary calculus 
[4] or second year calculus [5] is able to understand this paper. 
Parzen windows 

In order to avoid the difficulty of the nonlinear optimization problem, the 
orthogonal polynomials are used in most nonparamctric methods. It seems 
that the orthogonal function will introduce more roughness. In order to avoid 
introducing the roughness, the orthonormal basis is abandoned and the Parzen 
window functions [6], nonnegative functions, are adopted. Let S be the Dirac 
delta function. The Dirac delta function is a generalized function, 

6{x) = (3) 

when X 7^ 0, and 

/•OC 

S{x) = 1. (4) 

If it is necessary, then we shall consider the Dirac delta function as a linear 
functional defined on a function space [7]. Intuitively, we can start from the 
following identity 

fix) = Jsix- t)f{t)dt. (5) 

Here, / is the probability density function which will be estimated by ob- 
servations xi, X2, x^,. . . , Xm- Let / be the estimator of /. If the integration of 
([5|) can be approximated by summation, then the estimator is 

n 

fix) =^c,ipi{x), (6) 
1=1 

^^ix) > 0. (7) 

Usually, ipi are called window functions or kernel functions. It seems that 
([T])-© can be ignored. We can start from the estimator which is defined in 
([6|). If we can determine value of a properly, then the estimator is obtained. 
Though there are many window functions which are available [2] [3], we find 
that Bernstein polynomial is good a candidate. Clearly, it must be that 

fix)dx = 1. (8) 
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Let 



Pi = y Vi{x)dx. (9) 

Let I be defined 

m 

z=n/fe)- (10) 

Here, I is the likelihood function. From the works of statisticians, the value 
of Cj can be determined by maximizing the likelihood function [1] . The problem 
is to maximize / subjected to the constraints, 

n 

^PiCi = 1, (11) 

and 

0<c„ z = l,2,...,n. (12) 
Mathematically, since Cj are going to be determined, if we redefine /, 

n 

f{x)=Y,{Ci/p^)'P^{x), (13) 
i=l 

Then the constraints become 

n 

J2ci = l (14) 

and 



0<Ci, i = l,2, ...,n. (15) 

Generally, this problem should be solved by Kuhn- Tucker Theorem [8] . Like 

many mathematical theorems, both Kuhn- Tucker and Lagrange theories are 
not constructive and the non-constructive results can be traced back to the last 
axiom of real number, axiom of completeness [4]. In physics, the orthogonal 
functions arc very useful. In order to solve the nonlinear optimization of density 
estimation, the orthogonal functions were adopted by nonparametric approach. 
Quantum mechanics was discussed in the paper of Good and Gaskins 1971 [9] 
[10]. Since then most, if not all, nonparametric density estimators were built on 
the orthogonal functions which were inferred from quantum mechanics directly 
or indirectly. Let F be a vector space over the field of complex numbers. Let 
Vi, f2,. • • , Vn be orthonormal basis of V. Let 2; be a unit vector of V. Roughly 
" 2 

speaking, U z = J2 '^i'^i' ^^^^^ \ci\ is interpreted as the probability that z 
might be Vi in quantum theory. Hence Cj is called the probability amplitude. In 



3 



statistics, the real numbers work well. Therefore, the complex field is replaced by 
the real field. The difference between the probability density and the probability 
amplitude is clear and simple. Mathematically or symbolically, the symbol Cj is 
replaced by cf. Obviously, the constraints become 



= 1. (16) 



Only one constraint is left. Then the Lagrange's multiplier technique can be 
applied easily. Clearly, optimization problem play an important role in density 
function estimation. In order to avoid the difficulties, we will follow the approach 
of quantum theory and the concept of the probability amplitude is adopted. But 
we will use the result of Stone- Weirstrass theorem instead of the orthogonal 
functions 

Optimization on the compact manifold 

Since the likelihood function I is a C°° function of Ci defined on the compact 
subset of i?", / must has maximum value on the sphere. Let observations be xi, 



Let 



Let 



Let 



i=l 



I ^ III J. (18) 



c^Ci = r (19) 

be the constraint. Now, we start to solve the optimization problem. 
Let 



Let 



ay = 'fi{xj). (20) 



= /-A(^Qc,-r). (21) 



By the method of Lagrange's multiplier, we have 

dh 



From HT])- (HH), we get 
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ZjZ^T^-^Cfc = 0, fc = l,2,...,n. (23) 

j=i 1 

Multiplying (|23p by and taking the summation of index k, we get 

n m 

E('E^-^^^K- = 0' (24) 

fc=i j=i ^ 

Interchanging the summations, we get 



rn n 



E('E^^-^c^C'=) = 0- (25) 

j=i fe=i -J 

From jni), (HID, (HH), (EOH and (HH), we get 

lm~\r^ 0. (26) 

Hence 

I = \r/m. (27) 
Substituting ^ into ([231), we get 



^fc(E^-l)^0' fc=l,2,...,n. (28) 

Clearly, either 

Ck = 0, (29) 



or 



(E^-1) = 0' fc = l,2,...,n. (30) 

i=i 

It should be emphasized that ([^5]) and ([50)1 are not mutually exclusive. In 
order to linearization the equations (j30[) , we take some transformations of vari- 
ables. 

Let 

yj=r/{mlj). (31) 
Substituting (|^ into ([50]) . we get 

m 

E akjVj = 1 (32) 
Multiplying both side ([5^ by a constant 6*^, we have 
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Y,ak,B'y,=d\ (33) 

i=i 

Since the solutions of ([M]) are not affected by any factor, the constant 9 is 
introduced in equations ([5^ to fit the constraint. It seems that the constant 
is a redundancy because that Q must be 1. Later, it wiU be found in following 
lemmas and theorems that Q play an important role. 

Let 



y^^B'y,. (34) 
Substituting (l34|) into ([33|) . we get 

^akjyj=9'^. fc = l,2,...,n (35) 

Clearly, equations psp is a system of linear equations of yj . 
From (|3ip. we have 

r 

toL- = — . (36) 
From llTl), dlOl), dSll) and dsn), we have 

Va„c^ = . (37) 

And hence 



= j = l,2,...,m. (38) 

Clearly, equations p8|) are also linear equations of {ci/OY. Equations (|30p 
are replaced by two system of linear equations, ([35|l and ([38]) . The problem 
seems to be very simple. Actually, there are many combinations of p9p and 
(j30p .. Though, in these combinations, some of them may not yield the solu- 
tions of this optimization problem, all the feasible solutions of this problem are 
contained in the suitable combinations of equations (|29p and equations (|30p . 
If we solve the problem directly, then there will be the same complexities as 
the simplex method for solving linear programming problem. Furthermore, it is 
very difficulty to design the algorithm and to implement the computer program 
if it is not impossible. Even if the computer program is designed, then it might 
be a time-consuming program. However, it can be concluded that the nonlinear 
optimization problem is solvable theoretically. If the numbers m and n are very 
small, say 3, then it is a simple problem to solve the systems of linear equations. 
Generally, the extreme point of likelihood function is not unique. The results of 
computer simulation show that the extreme point of likelihood function seems 
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to be unique. The computer simulations are implemented when m and n are 
less than 10. 

Quantum theory approach 

In quantum mechanics, the wave function is linear combination of basis 
functions and the normalization of the wave function requires that the sum of 
the squares of coefficients should be unit. This gives us a clue to remodel our 
problem and the problem becomes easier. 

Let 

n 

J{x) = ^ UiVr(fi{x), (39) 
i=l 

where ^pi is the window function. 
Let 

m 

~l = \lJ[x,) (40) 
be the likelihood function. The constraints are 

n 

UiUi = r, (41) 

i=l 

and 



^ v,v^ = r. (42) 



The problem is to maximize / subjected to constraints (j4T|) and ([42|) . In order 
to find the connection of two models, we should define the following notations. 
Let 

n 

^ CiCi = r. (43) 

i=l 

Let 

n 
i=l 

Let 

l = X{h. (45) 

Let 
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5'n = {(ci, ...,c„); V'ciCi = r}. (46) 
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The first model is to find tlie extreme point of I on 5„ 
Let 



y-iUi = r. (47) 



Let 



Let 



Let 



Let 



E 

2=1 



= r. (48) 



I J = ^MiWi(/3i(a;j). (49) 



i^Wh (50) 



S'2„ = {(wi, ...,M„,wi, ...,w„);^UiU,; = r, ^UiWi = r}. (51) 

i=l 1=1 

Tlie second model is to find the extreme point of I on S2n- It is clear that 
Sn and 5*271 are compact subsets of i?" and i?^" respectively. Let / and I be the 
likelihood functions defined above. Clearly, both I and I have maximum. Let A 
be the set of all I. Let A be the set of all I. It is obvious that A C A. Therefore, 
the maximum of A is less than or equal to that of A. It will be shown, in theorem 
1, that the extreme points of I should be located at the points such that Ui = Vi, 
i = 1, 2, n. Therefore, the problem to maximize I subjected to the constraint 
(|43)) is equivalent to that of maximizing I subjected to the constraints (jlT)) and 

dm). 

Theorem 1. For each observation Xj, if there is (^i such that fiixj) > 0, 
then the extreme points of I should be located at the points such that Ui = Vi, 
i = 1,2, n. 

Proof. We assume that 

vi > ui (52) 

and the maximum is Im, that is, 
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for all I. Let 



hi > I 



(53) 



M- = = O^JUiVi. 

By choosing a proper value 0, constraints 



(54) 



n 



(55) 



and 



n 




= r 



(56) 



are satisfied simultaneously. By Cauchy-Schwartz inequality and (|52p. we get 



for some I. This is a contradiction. 
The iteration procedures 

Tough we have stated and proved Theorem 1, we need a constructive proce- 
dure to find the extreme point. It is not so easy to solve the nonlinear optimiza- 
tion problem. Usually, the sequences are constructed by iteration procedures. 
The well designed iteration procedures can generate monotonic sequences which 
are useful in theory and application. With the nested iteration procedures, the 
complicated problems such as mathematical formulation, designing of the com- 
putation algorithm and the computer programming can be solved in parallel. 
It seems that it is easier to maximize I than to maximize I. The reason why we 
solve the more complicated problem can be shown in the method of Lagrange's 
multiplier. The strategy of solving the nonlinear optimization problem with 
constraints is ignoring one of the constraints, say equation (|48p . This can be 
done by choosing the initial value of w^, Vi = \J r/n. Then the optimization 
problem becomes simpler because only one constraint is left. In order make it 
more clearly and precisely, we recall and define some identities. 



e > 1 



(57) 



and hence we have 



hi < I 



(58) 



Let 



ViLpi{x). 



(59) 



Let 



n 




(60) 
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Let 

~lj=J{x,). (61) 

Let 

m 

l=l[lj. (62) 

3 = 1 

The problem is to maximize I subjected to the constraint (|47)) . After the 
values of Ui being obtained, the value of Vi is updated by O^UiVi. By choosing 
the factor 9, the constraint ([^5]) is satisfied. Clearly, the iteration procedures 
can be obtained. And the Cauchy- Schwartz inequality is able to test the termi- 
nation of the iteration procedures. First, we summarize the whole procedures. 
Later, the associated mathematical theory of the procedure will be shown. The 
procedures are: 

Step (i). Initialize the procedure by setting fc — 1 and ~ \frjn^i = 
l,2,...,n. 

Step (ii), maximize I subjected to the constraint ^J^. Then values of 
u'^ ,i — 1,2, ...,n, are obtained. 

n 

Step (iii). Check the condition '^u'^v'^ + e > r is satisfied or not, where e 

i— 

is a small positive number to control the termination of the procedures. 

If the condition is satisfied, then stop the iteration procedures and the density 

n 

estimator, f{x) — ^ u^v^'.pi{x), is obtained. Otherwise, increase the value of 
1=1 

k by one, set uf — \J u^^^vf^^ , here 9 is a factor to fit the constraint ^JS^. 
Then go to Step (ii) and proceed the procedures. 



Remark 1. From Cauchy- Schwartz inequality, the values of 9 must be 
greater than or equal to 1 and hence the set of the values of the likelihood function 
is an increasing sequence. 

Since step (i) and step (iii) are so simple, the only problem which is left is 
how to complete the step (ii). Now, we will show how step (ii) can work well. 
In order to complete the step(ii), another nested iteration procedures will be 
designed and studied. In order to collaborate with the computer algorithm, new 
notations must be introduced. Let M*^and v'^ be obtained in the fc*'' iteration. 
Let 



/(x) = ^^i,'=(^;,V,(x)). (63) 

i=l 

Let 

U^)=v'yv,{x). (64) 

The simple notation, 
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f{x)^^Uiil;i{x) (65) 
1=1 

shall be used hereafter. 

The constructive proof and the procedures of optimization 

Lemma 1. Let f{x) ~ ^ Uiipi{x), where ipi are nonnegative functions. Let 

1=1 

Ij — f{xj). Let I ^ Y[ t>e the likelihood function. For each Xj, there is a 

ipi such that ipi{xj) > O.Then there are constructive procedures to maximize I 
subjected to the constraint. We recall the constraint (|Tf)) 

n 

UiUi = r. 

1=1 

Remark 2: Since ipi are nonnegative functions and the constraint is in- 
variant under the transformation, Ui — —Ui, the solution of this optimization 
problem, Ui, must be nonnegative. 

Proof. Let 

bij = il^i{xj). 

Let 

n 

II ^ I - X(^UiUi - r). 

i— 

By the method of Lagrange's muhipher, we have 

duk 

By simple symbolic computation of derivatives, we get 

r^^-2Aufc = 0, fc = l,2,...,n. 

j=i h 

Multiplying (|69p by Uk and taking the summation of index k, we get 

n m , 

^r(^^-2AufcK -0, fc = l,2,...,n. 

fc=i j=i h 

Interchanging the summations, we get 
j=i k=i h 
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(66) 
(67) 

(68) 
(69) 

(70) 

(71) 



From (gT]), (inni), (ini) and definition of Ij, we get 

Tm - 2Xr = 0, (72) 

and hence 

r= (2Ar)/m. (73) 
Substituting ([75]) into ([S^ . we get 

m , 

y2^-Uk = 0, fc=l,2,...,n. (74) 



Let u and bjbe n components vectors, where u = [mi,M2, and bj = 

[bij,b2j, ■■■,bnj]^ ■ By the constraint (|47|) and the assumption of this lemma, u 
and hj are not zero vectors. Rewrite equations (|74p 



-"J 
Let 



-V-^-u^O. (75) 
m ^ u • b,- 



ttfe = — ^ , A: = 1,2, ...,m. (76) 
u • bfc 



Substituting ^ into (US]), we get 



(77) 



Substituting ([77]) into ([7S|) . we obtain 



afc = — . fc = l,2,...,m. (78) 

^ E aj(bj - bfc) 



Let 



Aj = -(b. -b,), i,j = l,2,...,m. (79) 

m 

Substituting ^ into we obtain 

m 

^Dfe,afeaj = l, = l,2,...,m. (80) 

It should noticed that the major differences between ([7^ and ([50]) are the 
range of the indices since m and n are different. If m = 1, then the solution of 
equation ((80)) can be obtained. From ([TT]) . the lemma is proved. Fortunately, if 
m > 1, then we can solve equations (jSOp one by one. It is very simple to show 
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that the existence and the uniqueness of the solution of ([50]) . we will complete 
the details of works in the following lemmas and theorems. Now, we assume 
that the solution of (I5D1) can be obtained effectively and the solution is unique. 
Therefore, the lemma is proved and it seems that step (i), (ii) and (iii) can work 
well. 

In deriving the equations, the systematic notations are adopted. Therefore, 
the variables, ak and aj in the equations (|80p are interchangeable. In order 
to simplify the problem, these equations will be solved one by one in iteration 
procedures. The symmetry shall be destroyed because only one variable, ak, 
will be focused. Usually, there are at least two sets of variables in iteration 
procedures, one set is associated with the old value and the other set is associated 
with the updated new value. Therefore, we use the symbols with prime for new 
value. In order to analyze the details of algorithm, the delta notation shall 
be used, for example, a'j, = ak + Aa^. Therefore, there are different forms 
of equations ((80|) in different notations. The functions of different forms of 
equations ([50)1 are obvious because each form is associated with a meaning. 
The error of each equation is denoted hy Ej j = 1,2, ...,m, and AEj is the 
variation of Ej in iteration procedure. The total sum of the absolute value of 
Ej is denoted by E, and AE is the variation of E. E^^'> is the value of E in j*'' 
iteration.. These notations and their meanings shall be defined in the context. 

Remark 3: From identity |y6p and remark 2, a^ must he nonnegative. 
Clearly, the solution of |77[ ), u, shall satisfy the constraint, u ■ u — r. It is not 
necessary to worry about that the quantity u • in |y6p might be zero. The 
identity ^76^ and ^78^ are adopted for the convention of symbolic computations. 
These will be shown later. 

Though the likelihood function is highly nonlinear, equations in ([80]) are a 
system of quadratic equations. Intuitively, the solution of a single quadratic 
equation can be obtained easily. In order to solve the equations (|80p one by 
one, the nested iteration procedures are constructed. We write one of them, say 
A;*'' equation, the quadratic equation of a^, 

n 

Dkkal + D,ka^)ak -1 = 0. (81) 

Clearly, the only positive solution of (|8ip is (— s + ^/s'^ + 4Dkk)/ {'^Dkk), 
where s = ^ Dikat. If the equations in (I80p can be solved one by one, then 

i^k 

the problem becomes simpler. Indeed, the equations in (j80p can be solved one 
by one and the sum of all errors is reduced in each time. Basing on this fact, 
we are able to design another set of iteration procedures step (a), (b) and (c) 
to solve the problem. Now, we start to design the procedures. 
Let 

m 

Let 
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E = J2m. (83) 

4=1 

In the iteration procedures, the values of Ei and E shaU be changed. Let AEi 
be variation of Ei. Let AE be variation of E. In order to collaborate with the 
algorithm, the nested iteration procedures are designed in the step (ii) . Clearly, 
the problem is to minimize the value of E. And it must be proved that the 
minimum of E is zero. Therefore, the solution of (|5D|) and the solution of (|77p 
are obtained. Now, we construct the iteration procedures to complete step(ii). 
The associated mathematical lemmas and theorems of algorithm will emerge. 

First, initialize the procedure by setting ak — y^l/ [Dm), fc = 1,2, ...,m, where 

D is the maximum of Dij . Then the iteration procedures are: 

m m 

Step (a). Compute Ei — ^ DijaiOj — l,i = 1,2,..., to, and E = ^ \Ei\- 

3 = 1 i=l 

Go to step (b). 

Step (b). Test the condition whether E < 6 is satisfied or not, where 
S is a small positive number to control the termination of the procedures. If 
E < S, then the desired results are obtained. Compute Uk by the identity |77p , 
/e = 1,2, ...,n, and terminate the iteration. Otherwise, go to step (c). 

Step (c). Find the largest element of the set of all \Ei\. Suppose that the 
largest element is \Ek\ for some k. Eliminate Ek by updating the value of ak 
by a'f.,a'^. = (-s + ^Js'^ + ADkk)/{2Dkk), where s = J2 Dikai- Go to Step (a). 

i^k 

Now, there will be no difhculty to implement steps (a), (b) and (c). Intu- 
itively, steps (a), (b) and (c) shall be terminated in finite steps if the values 
of E is strictly decreasing sequence which converges to zero. In lemma 3, it 
will be proved that the values of £' is a decreasing sequence. Lemma 2 will 
support lemma 3. In lemma 5, it will be proved that the values of i? is a strictly 
decreasing sequence which converges to zero. Lemma 4 will support lemma 5. 

Lemma 2. All iterations, steps (a), (b) and (c), the set of all at, k = 
1,2..., m, are bounded above and the set of all ak, k — 1,2..., m, are bounded 
below by a positive number , say B, B > 0. That is, ak > B, k — 1, 2..., m. 

Remark 4. What we mean all a^ is including all ak and all a'^. 



Proof. The value of ak is either the initial value ^l/{Dm) or the up- 
dated value (— s -I- v? + 4i5fcfc)/(2D/c/c). It is very easy to verify the following 
inequalities 



^s + Vs'^ + 4Dkk ^ -s + ^s^ + 2sDkk + Dlk 1 



2Dhh 2Dkk 2' 



when s > 2. 



when s < 2. 



-s + Vs^ + "iDkk ^ V4 + 4Dkk 

TFi ' v°"^j 



2Dkk 2Dkk 
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It is obvious that ak are bounded above. Since s = ^ikOti, s is bounded 

above. Next, we are going to prove that there is a positive number B such 
that ak > B, k = 1,2..., m, in aU iterations. Clearly, ak are either the 
initial value or updated by (— s + Vs-^ + 4:Dkk)/{2Dkk)- The derivative of 
(-S + y/s'-^ + 4:Dkk)/i2Dkk) is ( -1 + s/Vs^ + ADkk)/{2Dkk), which is negative 
for all s > 0. Therefore, {—s + ^/s"^ + ADkk) / {"^Dkk) is a decreasing function of 
s. It is obvious that 



lim (-S + ^s^+ADkk)/{2Dkk) = 0. (86) 

s — >oo 

Since s is bounded above, (— s + y/s^^~{^ADkk) / {2Dkk) has a positive lower 
bound,. Therefore, ak is bounded below by a positive lower bound, say_B. 

Remark 5. Lemma 2 does not imply Uk are bounded below by a positive 
number, some Uk might tend to zero. 

Lemma 3. The values of E in iteration procedures, step (a), (b) and (c), 
is a decreasing sequence. 

Proof. From ((82|) and ([83|) . we find that equations (l80|) can be solved one by 
one. One of equations ((80|) with one variable, say a'f., will be solved. The error 
of the equation with index k, Ek, is removed completely in step (c). Therefore, 

lA^^fcl = l^^fcl (87) 
for the particular index k and it might be that 

\^E,\^\E,\ (88) 

when j ^ k. Though there are two roots of a quadratic equation, only one of 
them is positive. From equation ([8T|) . it must be (— s + Vs'^ + 'iDkk)/ {2Dkk)- 
Let 

Aak = a'k-ak. (89) 

The value of Aa^ is the difference of two positive numbers which are bounded 
above. Clearly, 

\Aak\<ak (90) 

when Aak < 0. In the step (c), the value of E is reduced by AE. In order to 
update the value of ak, we rewrite the equation ([8T|) 

m 

a'k J2 Dua^ + Dkka'ka'k -1 = 0. (91) 

i=jLk 

Some times, it is more convenient to use the delta notation. Therefore, 
equation (pij) becomes 

m 

[ak + Aak) Dk,a, + Dkk{ak + Aakf -1 = 0, (92) 

i^k 
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Aafc ^ Dkiai + AakDkkak + Dkki^ak)'^ + a/c ^ DkiUi -1 = 0. (93) 



From ([82]), we get 



AEk = Uk ^ Dkia^ - 1, 



i=l 



for this particular index k. 
Rewrite 



(94) 



Aafc Dk^a, + AakDkkOk + Dkk{Aakf = -{ak ^ Dk^a, - 1). (95) 



i=l 

Clearly, 



1=1 



Attfe ^ + AakDkkO-k + Dkk{Aak) 

m 

I Aafc I (^ ffciOi + Dkk^ak + DkkAuk) 

i^k 



\AEk 



= \AEk\ 



(96) 
(97) 



All quantities in (^ Dkidi + Dkk'^ak + DkkAak), except Aak, are positive. 
From (|5D)) . for any case. 



( y^ -Dfciaj + DkkCtk) 

i^k 



< 



( y^ -DfeiQi + Dkk^ak + DkkAak) 

i^k 



(98) 



Therefore, 



lAafcl 



( y^ -Dfciai + DkkCtk] 

i^k 



< \AEk. 



(99) 



If we write whole system of equations ([50]) . then the upper bound of all 
|A£^i|, i ^ k, can be figured out. From ([5^ . we get 



for all i. But 



Ai;, = (a, + Aa,)Y^ D,j{aj + Aa^) - 1 - E,. 
i=i 



Aai = 0, k. 



(100) 



(101) 
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From (fTUO)) and pUT]) . we get 

AE, = AakD.ka,, i ^ k. (102) 

Since 

D^k^Dk^, (103) 

AakDikai = AakDkiai. (104) 
From the inequality ([M)) and p02p . we get 

^|A£;,| + |Aafc|i^fcfcafe < |A£;fc| . (105) 

From (fT05)) . 

lAafcl Dkkak < \AEk\ - ^ \AE,\. (106) 

27^ A: 

From ([83|l. we get 

m 

A£;-;^A|i;,|. (107) 

i=l 

Since 

|a + fc| > |a| - |6| (108) 

for any a and 6, 

|Ai?| > |Ai?fc|-^|A£;,|. (109) 

Clearly, AE is negative and dominated by |Ai?fe|, 

\AE\>\Aak\ Dkkak. (110) 

An hence the set of the values of E generated by iterations is a decreasing 
sequence, We have proved the lemma. 

For each iteration, the value of E is denoted by a symbol, say E^"^^ in the 
i*^ iteration. The notations Ei and i?^*-' are associated with different meanings. 
Let limi^oo = E°°. Clearly, the lower bound of \Aak\ DkkCtK will serve 
for two purposes, one is to prove that the sequence i?^*) is a strictly decreasing 
sequence and the other is to prove that E°° — 0. 

Lemma 4. If E°° > and k is the index such that \Ek\ > \Ei\ i = 
1, 2, m, then the set of all |Aafc| DkkOtk, in all iterations of step (a), (b) and 
(c) has a nonzero lower bound. 

Proof. It is obvious that 
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\Ek\>E^/m. (Ill) 
In each iteration procedure, only one equation is solved. From (1971) and 



(fTTT|) . we get 



DkiUi + Dkk'^ak + Dkk^ak) 

i^k 



> 



E° 



The first term absorbing DkkO-k from the second term , we get 



Dkia^ + DkkCtk + Dkk^ak 



> 



m 



(112) 



(113) 



Since lAa^l and are bounded above, 
also bounded above, say 



Dkiai + DkkUk + Dkk^Oik 



IS 



Dkiai + DkkCtk + Dkk^ctk 



< M. 



From (fn^ and pTi|) . we get 



lAafel >S°°/(Mm). 



(114) 



(115) 



Therefore, | Aa^ |is bounded below by a positive number and hence \ Aak \ DkkCtk 
is bounded below by a positive number in all iterations. Therefore, we have 
proved the lemma. 

Lemma 5. hm = 0, that is, E"^ = 0. 

k — >oo 

Proof. For any e, e > 0, there is an positive integer N such that 

E^^'^ -e< E°°, 

whenever j > N. Since \Ek\ is the largest one in the j*''iteration, 

\Ek\>E°-/m. 

From inequality (jllOp . 

E'-^+^^ + \Aak\Dkkak<E'^^\ 

Therefore, 



E^^+^^ + \Aak\Dkkak - e < E'^^) _ ^. 



If we assume that 



E°° > 0. 



(116) 
(117) 
(118) 
(119) 
(120) 
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By lemma 4, the set of all \l^ak\DkkOik has a nonzero lower bound. We 
choose £ such that e is less than the lower bound of lAa/tj DkkCtk- That is, 



Then 



From (|116p . we get 



\Aak\Dkkak-e>0. (121) 
^(j+i) < _ £. (122) 



^0+1) < E"^. (123) 

It is a contradiction because i?*^^^ > for all j. Therefore, we have proved 
the lemma and hence E°° — 0. 

Since = 0, the iteration procedures, step (a), step (b) and step (c), 
should terminate in finite steps of iterations and step (ii) can be executed com- 
pletely. Therefore, lemma 1 is proved completely. In lemma 7, it will be proved 
that the iteration procedures, step (i), step(ii) and step (iii), shall be terminated 
in finite steps. Lemma 6 will support lemma 7. 

Lemma 6. Let 6 be obtained in the iteration procedures, step (i), step(ii) 
and step (iii). Then lim 6 — 1. 

k — *oo 

Proof. It is obvious that 

t > 1. (124) 

and hence is an increasing sequence. 
Let 

lim? = r°. (125) 
For any e > 0, there is / such that 

+ e>r^. (126) 

k 

If lim 6 does not exist, then there exist e > 0, for any K, there is k > K 

k — »oo 

such that 

t >1 + e. (127) 



From (|64p . ((65)l and the definition I, we get 



^+1 > (^o'^^m'jk^ (^28) 
where m is the sample size. Since 

(1 + £)" > 1 + 77ie, (129) 
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+ me)?. (130) 

Therefore, 

?+i>? + m£l^. (131) 
Choosing e = mel , we have 

?+i>? + e. (132) 

From (|126p . we get 

?+i>P°. (133) 
It is a contradiction. Therefore, 

lim e'' ^ 1. (134) 

k—^oo 

n 

Lemma 7. Let — ^ wf^f • Then lim — r and the iteration proce- 



dures, step (i), step(ii) and step (iii), shah be terminated in finite steps. 

—k 

Proof. From the definition of 9 in step (iii), we get 



n 

7:k-k ' 



(135) 

i=l 

From (fT34l) and (fTSS]) . we get 

Urn P'' = r. (136) 

k — >oo 

Therefore, the iteration procedures, step (i), step(ii) and step (iii), shaU be 
terminated in finite steps. 

Lemma 8 will show the resuh of theorem 1 can be obtained by constructive 
method. 

Lemma 8. Let w'^ = — |, i = 1,2, ...,n. A; = 1,2, be a set sequences 
generated by the iteration procedures, step (i), step(ii) and step (iii). Then 
lim = 0, i = 1, 2, n. 

k — ^oo 

n n n 

Remark 6. By Cauchy- Schwartz inequality, (^ u^vf)^ < ^ (uf )^ J2 ("^f)^; 

2—1 i—1 i—1 

the equal sign hold only if — v^,i — l,2,...,n. Intuitively, it is obvious that 
the condition in step (iii) must be satisfied. Otherwise, I is not bounded above 
and I does not have maximum. 

Proof. By simple computation, we get 

n n n n 

E - = E ^< + E -N' - 2 E (137) 
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In all iteration procedures, the constraints (|T7|) and must be satisfied. 
Therefore 



n 



Y,{u1-v^f^2r~2Y,u1vl (138) 



From (fnSl) and ([T55)) . we get 

lim = 0, i = l,2,...,n. (139) 

k — s-oo 

Combining the results theorem 1 and lemma 8, the problem of optimization 
is solved almost. 

The unique theorem 

Theorem 2. The solution of equations (|80p is unique. 

Proof: Of course, only the positive solutions make sense. Let e^ = aibj, 
where ai is a solution that we have obtained by the iteration procedures step (a), 
step(b) and step (c) . Let 



From dSni) and piO)) . we get 



(140) 



^ey=l, t^l,2...,m. (141) 
Consider the following equations, 

rn 

J2 "^^jP-Pj = 1' * = 1' (142) 

Here I3i, i — 1,2..., m, are unknowns. Then 

Pi^l, i=l,2...,m, (143) 
is a solution of p4ip . If the there is another solution set, say 

Pl>P2>,->(3m- (144) 

From (|14ip . we know that the equal sign can not hold all times. From p42p . 
we have 

m 

/3iEeijV3j = l, (145) 

and 
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It is obvious that 



e^, > 0, (147) 

for all i. Therefore, 

mm mm 

Pi eijPj >/?i eij/?™ =/3i/3„ ^ eij > ^ e^jf3j. (148) 

We have used the identities (|14ip at least two times. From (|142p . (|146p 
and (|148p . we find that it is a contradiction. We have completed the proof the 
theorem. 

Theorem 3. The solution of ([71]) is unique and hence the maximum value 

n 

obtained by step (ii) is the global maximum on the sphere ^ UiUi — r. 



Proof. For simplicity, we use ((75|) . the vector notations, instead of ([74]). If 
there are two solutions say u and u'. Therefore, 

j=i J 

And 

■m , 



TO 



Let 



Let 



afc = ^r-, fc = l,2,...,m. (151) 



4 = — ^,fc== 1,2,...,TO. (152) 



Substituting p?T|) into plS)) . we get 



"=;^E":'b,' (153) 



Substituting (fT52l) into (fTSO]) . we get 



"' = -E";b,- (154) 



Substituting (|153p into p5ip . we get 
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Qffc = , /c = 1,2, ...,m, (155) 



Substituting pM]) into we get 

1 



:E«'fe(b,-bfe) 



-, = l,2,...,m. (156) 



From (|79)) . (|155p and (|156p . we get two systems of equations 

m 

^ D/cjOffcaj = 1, fc = 1, 2, ...,TO, (157) 

and 

m 

^I^feja'fca^ = 1, A: = 1,2, ...,m. (158) 

By theorem 2. 

a'fc = a/c, (159) 
for fc = 1,2, ...,TO. From and we get 

u' = u. (160) 
And hence the maximum which is obtained in this algorithm is the gfobal 

n 

maximum on the manifold, the sphere ^ UiUi — r. 

i 

Numerical examples 

No matter how good might the paper be, the final result must be verified by 
numerical examples. There are three examples. The results are shown in Figure 
1, Figure 2 and Figure 3. The estimator is obtained by Bernstein polynomials 

[2], [3]. 
Let 

ipi{x) = Ni{n\/{i\{n~i)\))x\l-x)''-\ 0<x<l. (161) 
Here, Ni is a factor to make 

f-i 



ipi{x)dx=^l. (162) 







Let 



f{x) = ^Ciipi{x). (163) 



i=l 



23 



We use the density estimator which is defined in the very beginning identity 
(|6]) though it is computed by ((39)) . All the observations, xi, X2,- ■ ■ , Xm, must 
be contained in an interval [a, b] . It is a simple work to transform the interval 
[a, b] to the interval [0, 1]. 

Example 1. 

The density function is defined on [0, oo), 



f{x) = cxp{-x). 

Example 2. 

The density function is defined on [0,4], 



fix) = 2/3, 

when 1 < X < 2; 

f{x) = 1/3, 

when 3 < a; < 4; 

f{x) = 0, 

otherwise. 
Example 3. 

The density function is defined on [0,4], 



fix) = 1, 

when < a; < 1/2; 

fix) = 1/2, 

when 1 < a; < 3/2; 

fix) = 1/2, 

when 3 < X < 7/2; 

fix) = 0, 

otherwise. 

The density function of Example 2 and Example 3 are not continuous and 
hence it is inappropriate to apply Bernstein polynomial to these examples. If 
the piecewise spline is used, then the result shall be better actually. We will 
not discuss the piecewise Bernstein polynomial in this paper. Comparing with 
the existent method [11], [12] etc., the spline kernel or spline window is a new 
method with potential because there will be new useful splines that might be 
designed in near future. At least, there are three useful splines, B-spline, Cubic 



24 



spline and Bezier spline. The works of source program designing, debugging and 
maintaining are more difficult than the mathematical proofs because they are 
tedious works. More than four kernel functions or window functions are tested, 
including B-spline, overlap B-spline, Bezier spline and piecewise Bezier spline. 
Though we do not show the result of B-spline approach, most programs are 
tested by B-spline method first. We will not list the definition of B-spline be- 
cause it is available to find the definition of the splines in the books of numerical 
analysis. It seems that B-spline method can be taken as the priori in Bayesian 
approach and hence piecewise Bezier spline method can be taken as posterior in 
Bayesian approach. Unlike the Bezier spline, the B-spline need the extra con- 
trol points, the knot points [2], and these knot points make the programs more 
complicated and difhcult. In the testing program, there are about 300 window 
functions are used in B-spline method. It is a good experiment to solve about 
300 nonlinear equations. The whole work is accomplished by using the oldest 
fashion and the most modern language, visual fortran. If it is necessary, then 
the fortran source programs will be appended. 
Discussion and conclusion. 

The algorithm is so attractive that it is not necessary to prove that these 
sequences u*^, and u'j^vf, i ~ l,2,...,n converge. The results of computer 

n 

output show that these sequences uf, vf, wj^wf converge . Moreover, J2 '^i'^^i 

1=1 



is an increasing sequence and 9 is a decreasing sequence. The algorithm is to 
maximize likelihood function I and to terminate the procedures by the condition 

n 

^ UiVi > r — e. The constraints, ((47|) and (|48l) . are satisfied in every step. Since 
it has been proved that lim w*^ = 0, all \ui — Vi \ are very small when the iteration 

k — ^oo 

procedures are terminated. In this paper, we do not prove the convergence of 
the sequences u'^, vf and wf^vf, i = l,2,...,n. It should be reminded that 
the problem is to maximize the likelihood function subjected to the conditions 

n 

^ UiUi = r and Vi = Ui, i — 1,2, ...,n. We think that the problem is solved 

almost. It is still an open problem whether the iterations procedures, step (i), 
(ii) and (iii), will serve the purpose or not, for finding the global maximum of A7 
Of course, step (i) play important role for searching for the global maximum 
of A, it seems to be so. We think that only if the initial value of Vi in step 
(i) is set Vi ^ for all i, then the procedures will find the global maximum. 
But the proof is not completed yet. It is the unique theorems, theorem 2 and 
theorem 3, that simplify the complicated problem and gives us the motivation 
to prove the global property. Since the consistency of parametric estimator has 
been proved statistician [1], the only problem left is finding the point which will 
yield the global maximum of likelihood function. To the best knowledge of the 
authors, there is no definite answer for finding the global maximum of nonlinear 
optimization problems. Though the problem do not be solved completely in 
theory, the work and its related algorithm are very useful in practical problem. 
The proof of the lemma 8 is short and simple because this is the final version. 
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The first version is abandoned because it is lengthy and complicated. In the 
first version of the proof, we use the method of variation. The technique of the 
first proof in lemma 8 is almost the same as that of quantum physics, especially 
in quantum field theory and string theory [13]. 

To follow the approach of most nonpar ametric approaches, we use the ad- 
vantage the probability amplitude which is introduced in the quantum the- 
ory. Though the orthogonal polynomials arc also used in both nonparametric 
approaches and quantum theory, we use the Bernstein polynomial. It is the 
Bersnstein polynomials that unify and simplify fundamental problems such as 
parametric approach and nonparametric approach, consistency of the estimator 
and the most difficult problem of density estimation, the nonlinear optimization 
problem. The probability amplitude is stressed most books of quantum physics 
[14]. 

It should be clarified that the research work is initiated and completed fi- 
nally by Yeong-Shyeong Tsai. Without the consultation with Lu-Hsing Tsai, 
Hung-Ming Tsai and Po-Yu Tsai in quantum physics and personal computing 
system, and the consultation with Yin-Lin Hsu in statistics, the paper can not 
be completed. 

Allow us to discuss more mathematics. Since quantum theory is built on the 
Hilbert space, the physicists use the complete sets of the space. Therefore, the 
statisticians working on nonparametric approach use the same tool as physicists. 
In order to avoid the roughness introduced by the complete sets, we use the result 
of Stone- Weirstrass theorem. If it is necessary, then we will treat the space 
of continuous functions or measurable functions as metric space or topological 
space. Therefore, we use countable dense subset of the space, the set of Bernstein 
polynomials. 
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Figure 1: The density function is exp(-x). The sample size is 80. There are 11 
windows of Bezier spline, n=10. 
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Figure 2: The density function is bimodal. The domain is [0,4]. f(x)=2/3 when 
X is in [1,2]; f(x)=l/3 when x is in [3,4]; f(x)=0, otherwise. The sample size is 
180. There are 35 windows of Bezier spline. 
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Figure 3: The density function is trimodal.The domain is [0,4]. f(x)=l when x 
is in [0,1/2]; f(x)=l/2 when x is in [1,3/2]; f(x)=l/2 when x is in [3,7/2]; f(x)=0, 
otherwise. The sample size is 180. There are 35 windows of Bezier spUne. 
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